A "Long Indel" model for evolutionary sequence alignment.
Authors
Abstract
We present a new probabilistic model of sequence evolution, allowing indels of arbitrary length, and give sequence alignment algorithms for our model. Previously implemented evolutionary models have allowed (at most) single-residue indels or have introduced artifacts such as the existence of indivisible "fragments." We compare our algorithm to these previous methods by applying it to the structural homology dataset HOMSTRAD, evaluating the accuracy of (1) alignments and (2) evolutionary time estimates. With our method, it is possible (for the first time) to integrate probabilistic sequence alignment, with reliability indicators and arbitrary gap penalties, in the same framework as phylogenetic reconstruction. Our alignment algorithm requires that we evaluate the likelihood of any specific path of mutation events in a continuous-time Markov model, with the event times integrated out. To this effect, we introduce a "trajectory likelihood" algorithm (Appendix A). We anticipate that this algorithm will be useful in more general contexts, such as Markov Chain Monte Carlo simulations.
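As an illustration of what "event times integrated out" means, the following is a minimal sketch, not the algorithm of Appendix A. It assumes a path that visits states with pairwise distinct exit rates, in which case integrating over all orderings-preserving event times gives a closed-form sum of exponentials. The function name `trajectory_likelihood` and its parameters `exit_rates`, `step_rates`, and `t` are hypothetical names chosen for this example; the case of repeated exit rates is not handled here.

```python
import math


def trajectory_likelihood(exit_rates, step_rates, t):
    """Likelihood of one fixed path s_0 -> s_1 -> ... -> s_n in a
    continuous-time Markov chain, with the n event times integrated
    over 0 < t_1 < ... < t_n < t, ending in s_n at time t.

    exit_rates[i] is the total rate of leaving s_i (n + 1 values).
    step_rates[i] is the rate of the specific jump s_i -> s_{i+1} (n values).

    Assumption of this sketch: all exit rates are pairwise distinct.
    """
    if len(exit_rates) != len(step_rates) + 1:
        raise ValueError("need one more exit rate than step rates")
    if len(set(exit_rates)) != len(exit_rates):
        raise ValueError("this sketch assumes pairwise distinct exit rates")

    # Product of the rates of the jumps actually taken along the path.
    rate_product = math.prod(step_rates)

    # Convolution of exponential waiting times, in closed form:
    # sum_j exp(-lambda_j * t) / prod_{k != j} (lambda_k - lambda_j)
    total = 0.0
    for j, lam_j in enumerate(exit_rates):
        denom = 1.0
        for k, lam_k in enumerate(exit_rates):
            if k != j:
                denom *= lam_k - lam_j
        total += math.exp(-lam_j * t) / denom
    return rate_product * total


# Two-state check: A jumps to an absorbing state B at rate a, so the
# path A -> B within time t has probability 1 - exp(-a * t).
a, t = 0.7, 2.0
print(trajectory_likelihood([a, 0.0], [a], t), 1.0 - math.exp(-a * t))
```

The closed form is numerically fragile when two exit rates are nearly equal, because the denominators approach zero; a general-purpose method has to treat that case separately.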
Similar resources
Phylogenetic Profiling of Insertions and Deletions in Vertebrate Genomes
Micro-indels are small insertion or deletion events (indels) that occur during genome evolution. The study of micro-indels is important, both in order to better understand the underlying biological mechanisms, and also for improving the evolutionary models used in sequence alignment and phylogenetic analysis. The inference of micro-indels from multiple sequence alignments of related genomes pos...
Using evolutionary Expectation Maximisation to estimate indel rates
Motivation: The Expectation Maximisation algorithm, in the form of the Baum-Welch algorithm (for HMMs) or the Inside-Outside algorithm (for SCFGs), is a powerful way to estimate the parameters of stochastic grammars for biological sequence analysis. To use this algorithm for multiple-sequence evolutionary modeling, it would be useful to apply the EM algorithm to estimate not just the probability...
The Relation between Indel Length and Functional Divergence: A Formal Study
Although insertions and deletions (indels) are a common type of evolutionary sequence variation, their origins and their functional consequences have not been comprehensively understood. There is evidence that, on one hand, classical alignment procedures only roughly reflect the evolutionary processes and, on the other hand, that they cause structural changes in the proteins’ surfaces. We first...
MCALIGN: stochastic alignment of noncoding DNA sequences based on an evolutionary model of sequence evolution.
A method is described for performing global alignment of noncoding DNA sequences based on an evolutionary model parameterized by the frequency distribution of lengths of insertion/deletion events (indels) and their rate relative to nucleotide substitutions. A stochastic hill-climbing algorithm is used to search for the most probable alignment between a pair of sequences or three sequences of kn...
A Poissonian Model of Indel Rate Variation for Phylogenetic Tree Inference.
While indel rate variation has been observed and analyzed in detail, it is not taken into account by current indel-aware phylogenetic reconstruction methods. In this work, we introduce a continuous time stochastic process, the geometric Poisson indel process, that generalizes the Poisson indel process by allowing insertion and deletion rates to vary across sites. We design an efficient algorith...
Journal title:
- Molecular Biology and Evolution
Volume 21, Issue 3
Pages -
Publication date 2004